A System to Mine Large-Scale Bilingual Dictionaries from Monolingual Web Pages
Authors
Abstract
This paper describes a system that automatically mines English-Chinese translation pairs from a large number of monolingual Chinese web pages. Our approach is motivated by the observation that many Chinese terms (e.g., named entities that are not stored in a conventional dictionary) are accompanied by their English translations in Chinese web pages. In our approach, candidate translations are extracted using pre-defined templates. Transliterations and translation pairs are then identified using statistical learning methods. We compare several approaches to aligning transliterations and mining translations on more than 300 GB of Chinese web pages. In our experiments on an MSN query log, we show that the mined bilingual dictionary greatly enlarges the coverage of an existing English-Chinese dictionary. It also improves query translation in cross-language information retrieval, leading to significantly higher retrieval effectiveness on TREC collections.
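The template-based candidate extraction step mentioned in the abstract might look roughly like the following sketch. The abstract does not specify the templates, so the single pattern used here (a run of Chinese characters followed by a parenthesized English phrase) and the names `TEMPLATE` and `extract_candidates` are assumptions for illustration only, not the authors' implementation.

```python
import re

# Hypothetical template (an assumption, not taken from the paper):
# a run of Chinese characters immediately followed by an English phrase
# in ASCII or full-width parentheses.
TEMPLATE = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z .'-]*[A-Za-z])\s*[)）]"
)


def extract_candidates(text):
    """Return (chinese_context, english_phrase) pairs matched by the template.

    The Chinese side is the whole character run before the parenthesis,
    so it is only a candidate context: deciding where the translated term
    actually starts is left to a later step.
    """
    return [(m.group(1), m.group(2)) for m in TEMPLATE.finditer(text)]


if __name__ == "__main__":
    for pair in extract_candidates("我喜欢看哈利波特(Harry Potter)的电影"):
        print(pair)
```

Note that the sketch returns the full Chinese run preceding the parenthesis rather than a cleanly delimited term. Picking the correct boundary is the kind of decision the abstract assigns to the statistical learning stage.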
Similar resources
Modeling Monolingual And Bilingual Collocation Dictionaries In Description Logics
This paper discusses an approach to modeling monolingual and bilingual dictionaries in the description logic species of the OWL Web Ontology Language (OWL DL). The central idea is that the model of a bilingual dictionary is a combination of the models of two monolingual dictionaries, in addition to an abstract translation model. The paper addresses the advantages of using OWL DL for the design ...
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
Word etymology in monolingual and bilingual dictionaries: lexicographers' versus EFL learners' perspectives
This paper deals with the treatment of word etymology in monolingual and bilingual dictionaries. It also investigates EFL learners' attitudes towards the importance of etymology for understanding the meaning of the words they look up in dictionaries. The data were collected through tasks of looking up Arabic loan words in English in monolingual and bilingual dictionaries. The results indicate t...
Machine Translation Detection from Monolingual Web-Text
We propose a method for automatically detecting low-quality Web-text translated by statistical machine translation (SMT) systems. We focus on the phrase salad phenomenon that is observed in existing SMT results and propose a set of computationally inexpensive features to effectively detect such machine-translated sentences from a large-scale Web-mined text. Unlike previous approaches that requi...
MT Express
The machine translation systems that are being developed at CRL are designed for assimilation purposes and are targeted at a large variety of source texts, including news articles, Web pages, newsgroups articles and email traffic. Thus, coverage and robustness are emphasized over depth of analysis, and accuracy over stylistic fluidity. Moreover, these systems are for the most part developed und...